{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "uxCkB_DXTHzf"
      },
      "outputs": [],
      "source": [
        "# Copyright 2024 Google LLC\n",
        "#\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#     https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Hny4I-ODTIS6"
      },
      "source": [
        "# Create High Quality Visual Assets with Imagen and Gemini\n",
        "\n",
        "<table align=\"left\">\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://colab.research.google.com/github/GoogleCloudPlatform/generative-ai/blob/main/vision/use-cases/creating_high_quality_visual_assets_with_gemini_and_imagen.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://www.gstatic.com/pantheon/images/bigquery/welcome_page/colab-logo.svg\" alt=\"Google Colaboratory logo\"><br> Run in Colab\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://console.cloud.google.com/vertex-ai/colab/import/https:%2F%2Fraw.githubusercontent.com%2FGoogleCloudPlatform%2Fgenerative-ai%2Fmain%2Fvision%2Fuse-cases%2Fcreating_high_quality_visual_assets_with_gemini_and_imagen.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN\" alt=\"Google Cloud Colab Enterprise logo\"><br> Run in Colab Enterprise\n",
        "    </a>\n",
        "  </td>    \n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://github.com/GoogleCloudPlatform/generative-ai/blob/main/vision/use-cases/creating_high_quality_visual_assets_with_gemini_and_imagen.ipynb\">\n",
        "      <img width=\"32px\" src=\"https://raw.githubusercontent.com/primer/octicons/refs/heads/main/icons/mark-github-24.svg\" alt=\"GitHub logo\"><br> View on GitHub\n",
        "    </a>\n",
        "  </td>\n",
        "  <td style=\"text-align: center\">\n",
        "    <a href=\"https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/GoogleCloudPlatform/generative-ai/main/vision/use-cases/creating_high_quality_visual_assets_with_gemini_and_imagen.ipynb\">\n",
        "      <img src=\"https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32\" alt=\"Vertex AI logo\"><br> Open in Vertex AI Workbench\n",
        "    </a>\n",
        "  </td>\n",
        "</table>\n",
        "\n",
        "<div style=\"clear: both;\"></div>\n",
        "\n",
        "<b>Share to:</b>\n",
        "\n",
        "<a href=\"https://www.linkedin.com/sharing/share-offsite/?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/vision/use-cases/creating_high_quality_visual_assets_with_gemini_and_imagen.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/8/81/LinkedIn_icon.svg\" alt=\"LinkedIn logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://bsky.app/intent/compose?text=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/vision/use-cases/creating_high_quality_visual_assets_with_gemini_and_imagen.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg\" alt=\"Bluesky logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://twitter.com/intent/tweet?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/vision/use-cases/creating_high_quality_visual_assets_with_gemini_and_imagen.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/5a/X_icon_2.svg\" alt=\"X logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://reddit.com/submit?url=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/vision/use-cases/creating_high_quality_visual_assets_with_gemini_and_imagen.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://redditinc.com/hubfs/Reddit%20Inc/Brand/Reddit_Logo.png\" alt=\"Reddit logo\">\n",
        "</a>\n",
        "\n",
        "<a href=\"https://www.facebook.com/sharer/sharer.php?u=https%3A//github.com/GoogleCloudPlatform/generative-ai/blob/main/vision/use-cases/creating_high_quality_visual_assets_with_gemini_and_imagen.ipynb\" target=\"_blank\">\n",
        "  <img width=\"20px\" src=\"https://upload.wikimedia.org/wikipedia/commons/5/51/Facebook_f_logo_%282019%29.svg\" alt=\"Facebook logo\">\n",
        "</a>            \n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "WYmxuX-wtDxO"
      },
      "source": [
        "| | |\n",
        "|-|-|\n",
        "|Author(s) | [Thu Ya Kyaw](https://github.com/iamthuya) |"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-nLS57E2TO5y"
      },
      "source": [
        "## Overview\n",
        "\n",
        "[Imagen on Vertex AI](https://cloud.google.com/vertex-ai/docs/generative-ai/image/overview) lets developers quickly generate high-quality images from simple text descriptions. Build and edit innovative AI-powered imagery with ease."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iXsvgIuwTPZw"
      },
      "source": [
        "### Objectives\n",
        "\n",
        "In this notebook, you will create high quality visual assets for a restaurant menu using Imagen and Gemini. You will:\n",
        "\n",
        "- Generate an image prompt with Gemini\n",
        "- Use Imagen to create high quality images using prompts\n",
        "- Implement a short pipeline to produce highly-detailed visual assets"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mvKl-BtQTRiQ"
      },
      "source": [
        "## Get started"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PwFMpIMrTV_4"
      },
      "source": [
        "### Install Google Gen AI SDK for Python"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "WYUu8VMdJs3V"
      },
      "outputs": [],
      "source": [
        "%pip install --upgrade --quiet google-genai"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "opUxT_k5TdgP"
      },
      "source": [
        "### Authenticate your notebook environment (Colab only)\n",
        "\n",
        "If you are running this notebook on Google Colab, run the following cell to authenticate your environment."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "id": "vbNgv4q1T2Mi"
      },
      "outputs": [],
      "source": [
        "import sys\n",
        "\n",
        "if \"google.colab\" in sys.modules:\n",
        "    from google.colab import auth\n",
        "\n",
        "    auth.authenticate_user()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "83SC2U07wWwe"
      },
      "source": [
        "### Import libraries"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "id": "ViFhnm7vwTjD"
      },
      "outputs": [],
      "source": [
        "from google import genai\n",
        "from google.genai import types"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ybBXSukZkgjg"
      },
      "source": [
        "### Set Google Cloud project information and create client\n",
        "\n",
        "To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).\n",
        "\n",
        "Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "id": "5gUjJ42Nh5kf"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "\n",
        "PROJECT_ID = \"[your-project-id]\"  # @param {type: \"string\", placeholder: \"[your-project-id]\", isTemplate: true}\n",
        "if not PROJECT_ID or PROJECT_ID == \"[your-project-id]\":\n",
        "    PROJECT_ID = str(os.environ.get(\"GOOGLE_CLOUD_PROJECT\"))\n",
        "\n",
        "LOCATION = os.environ.get(\"GOOGLE_CLOUD_REGION\", \"us-central1\")\n",
        "\n",
        "client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "i-6H9Ccq9z8-"
      },
      "source": [
        "## Image Generation"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "lhfneknwEDHT"
      },
      "source": [
        "### Load the image generation model\n",
        "\n",
        "The model names from Vertex AI Imagen have two components:\n",
        "* Model name\n",
        "* Version number\n",
        "\n",
        "For example, `imagen-3.0-generate-002` represents the **002** version of the **imagen-3.0-generate** model.\n",
        "\n",
        "`imagen-3.0-generate-002` is also known as [Imagen 3](https://cloud.google.com/vertex-ai/generative-ai/docs/image/overview).\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "id": "nEKPNLNL5RhD"
      },
      "outputs": [],
      "source": [
        "imagen_model = \"imagen-3.0-generate-002\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qLZagQ8NUDiB"
      },
      "source": [
        "### Generate your first image\n",
        "\n",
        "The `generate_image` function is used to generate images. All you need to input is a simple text prompt."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "0GYBwQuciCco"
      },
      "outputs": [],
      "source": [
        "image_prompt = \"A delicious pizza\"\n",
        "response = client.models.generate_images(\n",
        "    model=imagen_model,\n",
        "    prompt=image_prompt,\n",
        ")\n",
        "\n",
        "response.generated_images[0].image.show()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hkUso32D6tIQ"
      },
      "source": [
        "### Generating more than one image\n",
        "\n",
        "You can currently generate up to **4** images at a time with Imagen. Imagen provides several variations based on your prompt.\n",
        "\n",
        "You will do that in the cell below. An auxiliary function to display images in grid is also provided."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "id": "OK5nKWjP5jQh"
      },
      "outputs": [],
      "source": [
        "import math\n",
        "\n",
        "import matplotlib.pyplot as plt\n",
        "\n",
        "\n",
        "# An auxiliary function to display images in grid\n",
        "def display_images_in_grid(images):\n",
        "    nrows = math.ceil(len(images) / 4)  # Display at most 4 images per row\n",
        "    ncols = min(len(images) + 1, 4)  # Adjust columns based on the number of images\n",
        "\n",
        "    # Create a figure and axes for the grid layout.\n",
        "    fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(12, 6))\n",
        "\n",
        "    for i, ax in enumerate(axes.flat):\n",
        "        if i < len(images):\n",
        "            ax.imshow(images[i].image._pil_image)\n",
        "            ax.set_aspect(\"equal\")\n",
        "            ax.set_xticks([])\n",
        "            ax.set_yticks([])\n",
        "        else:\n",
        "            ax.axis(\"off\")\n",
        "\n",
        "    plt.tight_layout()\n",
        "    plt.show()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "bTjyMt_gBkOz"
      },
      "outputs": [],
      "source": [
        "image_prompt = \"delicious cupcakes\"\n",
        "response = client.models.generate_images(\n",
        "    model=imagen_model,\n",
        "    prompt=image_prompt,\n",
        "    config=types.GenerateImagesConfig(\n",
        "        number_of_images=4,\n",
        "    ),\n",
        ")\n",
        "\n",
        "display_images_in_grid(response.generated_images)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ZgbqGqAmksKE"
      },
      "source": [
        "### Load the Gemini model"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "id": "7fK44QsNkrUw"
      },
      "outputs": [],
      "source": [
        "gemini_model = \"gemini-2.0-flash-001\""
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yAJKZArKQUwm"
      },
      "source": [
        "### Use Gemini to generate text content\n",
        "\n",
        "The `generate_content` function can be used to generate content with Gemini 2.0 Flash model. You just need to provide a simple textual prompt."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "UA7TfcQafPHB"
      },
      "outputs": [],
      "source": [
        "from IPython.display import Markdown, display\n",
        "\n",
        "# Provide text prompt and invoke generate_content\n",
        "text_prompt = \"What are the steps to open a restaurant?\"\n",
        "\n",
        "response = client.models.generate_content(\n",
        "    model=gemini_model,\n",
        "    contents=text_prompt,\n",
        ")\n",
        "\n",
        "display(Markdown(response.text))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "KQgOkmtrfXR2"
      },
      "source": [
        "To improve the user experience and reproducibility, you will define a generation config and create a function to bootstrap content generation with Gemini 2.0 Flash."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "id": "8jDBQGbAxc4g"
      },
      "outputs": [],
      "source": [
        "# Provide text prompt and invoke generate_content\n",
        "\n",
        "\n",
        "def generate_content(prompt):\n",
        "    # Define generation config to improve reproducibility\n",
        "    generation_config = types.GenerateContentConfig(\n",
        "        temperature=0.5,\n",
        "        top_p=0.8,\n",
        "        top_k=10,\n",
        "        candidate_count=1,\n",
        "        max_output_tokens=1024,\n",
        "    )\n",
        "\n",
        "    responses = client.models.generate_content(\n",
        "        model=gemini_model,\n",
        "        contents=text_prompt,\n",
        "        config=generation_config,\n",
        "    )\n",
        "\n",
        "    return responses.text"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "we3xMomwWn40"
      },
      "outputs": [],
      "source": [
        "text_prompt = \"What are the steps to open a restaurant?\"\n",
        "response = generate_content(text_prompt)\n",
        "\n",
        "display(Markdown(response))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QxHYe8DnXAnP"
      },
      "source": [
        "### Generate a restaurant menu with Gemini 2.0 Flash\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "cPBXu12aPrRO"
      },
      "outputs": [],
      "source": [
        "text_prompt = (\n",
        "    \"Provide a menu for an Italian restaurant. Give each menu item a brief description.\"\n",
        ")\n",
        "response = generate_content(text_prompt)\n",
        "\n",
        "display(Markdown(response))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "YlEB7Y5GWkvo"
      },
      "source": [
        "### Improve an existing image prompt with Gemini 2.0 Flash\n",
        "\n",
        "Here you'll use the image prompt technique of including a **style**, a **subject**, and a **context / background**."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Iz-oQe8mWAyh"
      },
      "outputs": [],
      "source": [
        "image_prompt = \"A delicious pizza\"\n",
        "\n",
        "prompt_template = \"\"\"\n",
        "  Rewrite \"{image_prompt}\" into an image prompt.\n",
        "  For example: A sketch of a modern apartment building surrounded by skyscrapers.\n",
        "  \"A sketch\" is a style.\n",
        "  \"A modern apartment building\" is a subject.\n",
        "  \"Surrounded by skyscrapers\" is a context and background.\n",
        "\n",
        "  Here are a few \"styles\" to get inspiration from:\n",
        "  - A studio photo\n",
        "  - A professional photo\n",
        "\n",
        "  Here are a few \"context and background\" to inspiration from:\n",
        "  - In a kitchen on a wooden surface with natural lighting\n",
        "  - On a marble counter top with studio lighting\n",
        "  - In an Italian restaurant\n",
        "\n",
        "  The final rewritten prompt should be a single sentence.\n",
        "\"\"\"\n",
        "\n",
        "text_prompt = prompt_template.format(image_prompt=image_prompt)\n",
        "rewritten_image_prompt = generate_content(text_prompt)\n",
        "\n",
        "print(f\"PROMPT: {text_prompt}\")\n",
        "print(f\"RESPONSE: \\n  {rewritten_image_prompt}\")\n",
        "\n",
        "\n",
        "response = client.models.generate_images(\n",
        "    model=imagen_model,\n",
        "    prompt=rewritten_image_prompt,\n",
        "    config=types.GenerateImagesConfig(\n",
        "        number_of_images=4,\n",
        "    ),\n",
        ")\n",
        "\n",
        "display_images_in_grid(response.generated_images)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AVciZj3QVs8Y"
      },
      "source": [
        "## Visual asset pipeline\n",
        "\n",
        "Now that you have seen Gemini 2.0 Flash's capabilities to create a complete restaurant menu and how it can enhance the quality of image prompts, the next step is to establish a formal asset pipeline that leverages these abilities."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "k8Yv6bZDTCuE"
      },
      "source": [
        "### Standardize the output as JSON format\n",
        "\n",
        "In the previous attempts, Gemini 2.0 Flash returned either in Markdown or plaintext responses, which made it difficult to integrate with further steps.\n",
        "\n",
        "To solve this, we'll ask that Gemini standardize the response in JSON format. This will make the response easier to process and integrate downstream."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "5NvxrAWybH9K"
      },
      "outputs": [],
      "source": [
        "text_prompt = \"\"\"\n",
        "  Provide a menu for an Italian restaurant in a JSON format.\n",
        "  Each item in the menu should have a name and a description.\n",
        "  The item description should contain the ingredients and how the item was prepared.\n",
        "  Don't include \"of the day\" items such as \"soup of the day\".\n",
        "\n",
        "  The parent fields should be starters, main courses, desserts, and drinks.\n",
        "  Parent fields should be lower cased.\n",
        "  The child fields should be name and description.\n",
        "  Do not include JSON decorator. The response should start with an opening curly brace.\n",
        "  \"\"\"\n",
        "response = generate_content(text_prompt)\n",
        "print(response)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "DE95LHFUcR2j"
      },
      "outputs": [],
      "source": [
        "import json\n",
        "\n",
        "# Load the responses into a JSON format\n",
        "json_response = json.loads(response)\n",
        "json_response[\"starters\"]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "iyvzh4GE17oC"
      },
      "source": [
        "### Generating visual asset programmatically\n",
        "\n",
        "Using the newly formatted menu, you will be creating a batch of images programatically using Imagen 3. You will use the Gemini 2.0 Flash model to rewrite each description into a detailed image prompt."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "KYlyLJdD_LOC"
      },
      "outputs": [],
      "source": [
        "# convert a description into an image prompt\n",
        "description = json_response[\"starters\"][0][\"description\"]\n",
        "text_prompt = prompt_template.format(image_prompt=description)\n",
        "image_prompt = generate_content(text_prompt)\n",
        "\n",
        "print(f\"DESCRIPTION:\\n  {description}\\n\")\n",
        "print(f\"IMAGE PROMPT:\\n  {image_prompt}\\n\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hmgMzRtfRuZ6"
      },
      "source": [
        "Here you will generate **starters** from the menu"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "k3XlOUFfdG87"
      },
      "outputs": [],
      "source": [
        "for starter in json_response[\"starters\"]:\n",
        "    text_prompt = prompt_template.format(image_prompt=starter[\"description\"])\n",
        "    image_prompt = generate_content(text_prompt)\n",
        "\n",
        "    print(f\"ORIGINAL: {starter['description']}\")\n",
        "    print(f\"IMPROVED: {image_prompt}\")\n",
        "\n",
        "    response = client.models.generate_images(\n",
        "        model=imagen_model,\n",
        "        prompt=image_prompt,\n",
        "        config=types.GenerateImagesConfig(\n",
        "            number_of_images=4,\n",
        "        ),\n",
        "    )\n",
        "\n",
        "    display_images_in_grid(response.generated_images)\n",
        "    print()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "sk0eXjQ1nR4F"
      },
      "source": [
        "## Conclusion\n",
        "\n",
        "Congratulations! You have successfully created a professional restaurant menu with the help of Gemini and Imagen!\n",
        "\n",
        "Imagen on Vertex AI can do much more that generating realistic images. Imagen allows you to edit images, generate captions, ask questions of images, and more. Explore all the features of Imagen [here](https://cloud.google.com/vertex-ai/docs/generative-ai/image/overview)."
      ]
    }
  ],
  "metadata": {
    "colab": {
      "name": "creating_high_quality_visual_assets_with_gemini_and_imagen.ipynb",
      "toc_visible": true
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
